Question 1

a) For each categorical predictor, generate a bar chart that shows the odds of Churn for each category. Please order the categories in ascending odds of Churn. Also, please comment on each categorical predictor on whether it may affect the target variable.

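import pandas as pd

def convert_object_to_category(df):
    # catName is assumed to be a list of categorical column names defined elsewhere
    series = []
    for i in catName:
        df[i] = df[i].astype("category")
        # Reorder the categories by ascending frequency
        freq = df[i].value_counts(ascending=True)
        one_pm = df[i].cat.reorder_categories(list(freq.index))
        series.append(one_pm)
    return pd.concat(series, axis=1)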

The categorical predictors SeniorCitizen, Partner, Dependents, Contract, and Paperless may affect the target variable, because the odds of Churn differ noticeably across their categories. Gender, Phone Service, and MultipleLines are unlikely to have much effect, since the odds of Churn are similar across their categories.
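The odds behind each bar can be computed as the count of churners divided by the count of non-churners within a category. The following is a minimal sketch on made-up data; the helper name churn_odds, the toy values, and the "Yes"/"No" coding of Churn are assumptions, not taken from the assignment data.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month", "Month", "Month", "Month",
                 "Year", "Year", "Year", "Year", "Year"],
    "Churn":    ["Yes", "Yes", "No", "No",
                 "Yes", "No", "No", "No", "No"],
})

def churn_odds(df, predictor, target="Churn", event="Yes"):
    """Odds of the event per category: count(event) / count(non-event)."""
    is_event = df[target] == event
    n_event = is_event.groupby(df[predictor]).sum()
    n_total = is_event.groupby(df[predictor]).size()
    odds = n_event / (n_total - n_event)
    # Ascending odds, which is the ordering the question asks for
    return odds.sort_values()

odds = churn_odds(toy, "Contract")
print(odds)
# odds.plot.bar() would draw the bar chart (requires matplotlib).
```

Sorting by odds rather than by category frequency matches the ordering requested in part a).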

b) For each interval predictor, generate a grouped boxplot that shows the distribution of the interval predictor. The grouping variable, in this case, is the target variable. Also, please comment on each interval predictor on whether it may affect the target variable.

Tenure may have the largest effect on the odds of Churn. MonthlyCharges and TotalCharges may also have an effect, but a smaller one than Tenure.
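The comparison a grouped boxplot supports can also be read from a table of quartiles per Churn group. This is a rough sketch on invented numbers; the helper name boxplot_summary and the toy values are assumptions.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration only.
toy = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
    "Churn":  ["Yes"] * 5 + ["No"] * 5,
})

def boxplot_summary(df, predictor, target="Churn"):
    """Minimum, quartiles, and maximum of the predictor per target level."""
    quantiles = df.groupby(target)[predictor].quantile([0, 0.25, 0.5, 0.75, 1])
    # Rows are target levels, columns are the quantile levels
    return quantiles.unstack()

summary = boxplot_summary(toy, "tenure")
print(summary)
# toy.boxplot(column="tenure", by="Churn") would draw the grouped
# boxplot itself (requires matplotlib).
```

A large gap between the group medians relative to the interquartile ranges suggests the predictor may separate the two Churn groups.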

Question 2

a) Please provide a summary report of the Forward Selection. The report should include (1) the step number, (2) the predictor entered, (3) the number of non-aliased parameters in the current model, (4) the log-likelihood value of the current model, (5) the Deviance Chi-squares statistic between the current and the previous models, (6) the corresponding Deviance Degree of Freedom, and (7) the corresponding Chi-square significance.
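Items (5) to (7) of the report follow from the log-likelihoods and parameter counts of two nested models. A minimal sketch, assuming SciPy is available; the function name deviance_test and the example numbers are hypothetical.

```python
from scipy.stats import chi2

def deviance_test(llk_prev, k_prev, llk_curr, k_curr):
    """Deviance test between the previous model and the current (larger) model.

    llk_* are log-likelihood values; k_* are counts of non-aliased parameters.
    """
    deviance = 2.0 * (llk_curr - llk_prev)   # Deviance Chi-square statistic
    dof = k_curr - k_prev                    # Deviance degrees of freedom
    p_value = chi2.sf(deviance, dof)         # upper-tail Chi-square probability
    return deviance, dof, p_value

# Hypothetical values, not results from the assignment data
dev, dof, p = deviance_test(llk_prev=-100.0, k_prev=1, llk_curr=-90.0, k_curr=3)
print(dev, dof, p)
```

In forward selection, the candidate predictor with the smallest significance value is typically entered at each step, provided it falls below the chosen entry threshold.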

b) Please show a table of the complete set of parameters of your final model (including the aliased parameters). Besides the parameter estimates, please also include the standard errors, and the 95% asymptotic confidence intervals. Conventionally, aliased parameters have missing standard errors and confidence intervals.
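The 95% asymptotic (Wald) confidence interval is the estimate plus or minus about 1.96 standard errors. A small sketch under that assumption; the helper name wald_interval and the example values are hypothetical, and aliased parameters are represented here by a missing (NaN) standard error.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def wald_interval(estimate, std_err):
    """95% asymptotic confidence interval; missing for aliased parameters."""
    if std_err is None or math.isnan(std_err):
        return (math.nan, math.nan)
    half_width = Z_975 * std_err
    return (estimate - half_width, estimate + half_width)

# Hypothetical values, not estimates from the assignment data
lo, hi = wald_interval(0.5, 0.1)
aliased = wald_interval(0.0, math.nan)
print(lo, hi, aliased)
```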

Question 3

a) Please calculate the McFadden’s R-squared, the Cox-Snell’s R-squared, the Nagelkerke’s R-squared, and the Tjur’s Coefficient of Discrimination.
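The first three measures depend only on the log-likelihoods of the final model and the intercept-only model; Tjur's coefficient is the difference in mean predicted probability between the two observed classes. A minimal sketch using the standard definitions; the function names and example numbers are hypothetical.

```python
import numpy as np

def pseudo_r_squared(llk_model, llk_null, n):
    """McFadden, Cox-Snell, and Nagelkerke R-squared from log-likelihoods.

    llk_null is the log-likelihood of the intercept-only model,
    n is the number of observations.
    """
    mcfadden = 1.0 - llk_model / llk_null
    cox_snell = 1.0 - np.exp((2.0 / n) * (llk_null - llk_model))
    # Nagelkerke rescales Cox-Snell by its maximum attainable value
    nagelkerke = cox_snell / (1.0 - np.exp((2.0 / n) * llk_null))
    return float(mcfadden), float(cox_snell), float(nagelkerke)

def tjur(y, p):
    """Mean predicted probability among events minus that among non-events."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    return float(p[y == 1].mean() - p[y == 0].mean())

# Hypothetical values, not results from the assignment data
mcf, cs, nk = pseudo_r_squared(llk_model=-50.0, llk_null=-100.0, n=200)
td = tjur([1, 1, 0, 0], [0.8, 0.6, 0.3, 0.1])
print(mcf, cs, nk, td)
```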

b) Please calculate the Area Under Curve statistic and the Root Average Squared Error.
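Both statistics can be computed directly from the observed targets and the predicted event probabilities. The sketch below computes the AUC as the share of concordant event/non-event pairs (ties count one half) and the RASE as the square root of the mean squared difference; the function names and toy values are assumptions, with Churn coded as 1 for the event.

```python
import numpy as np

def auc_concordance(y, p):
    """AUC as (concordant + 0.5 * tied) / number of event/non-event pairs."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos = p[y == 1]
    neg = p[y == 0]
    # All pairwise differences; memory grows with len(pos) * len(neg)
    diff = pos[:, None] - neg[None, :]
    concordant = (diff > 0).sum()
    tied = (diff == 0).sum()
    return float((concordant + 0.5 * tied) / (pos.size * neg.size))

def rase(y, p):
    """Root Average Squared Error between 0/1 targets and probabilities."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Hypothetical values, not results from the assignment data
y_toy = [1, 1, 0, 0]
p_toy = [0.8, 0.4, 0.6, 0.2]
auc_value = auc_concordance(y_toy, p_toy)
rase_value = rase(y_toy, p_toy)
print(auc_value, rase_value)
```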

c) According to the F1 Score, please suggest the probability threshold for Churn. Using this threshold, what is the misclassification rate?
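One way to choose the threshold is to evaluate the F1 Score at every distinct predicted probability and keep the value that maximizes it. A minimal sketch, assuming an observation is classified as Churn when its predicted probability is at least the threshold; the function names and toy values are hypothetical.

```python
import numpy as np

def best_f1_threshold(y, p):
    """Return (threshold, F1) maximizing F1 over the distinct probabilities."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    best_t, best_f1 = None, -1.0
    for t in np.unique(p):
        pred = p >= t                      # assumed rule: Churn if p >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1

def misclassification_rate(y, p, threshold):
    """Share of observations whose predicted class differs from the target."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p >= threshold).astype(int) != y))

# Hypothetical values, not results from the assignment data
y_toy = [1, 1, 0, 0, 1]
p_toy = [0.9, 0.7, 0.6, 0.2, 0.4]
threshold, f1 = best_f1_threshold(y_toy, p_toy)
mis_rate = misclassification_rate(y_toy, p_toy, threshold)
print(threshold, f1, mis_rate)
```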